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Abstract —A blind compressive sensing algorithm is proposed 
to reconstruct hyperspectral images from spectrally-compressed 
measurements. The wavelength-dependent data are coded and 
then superposed, mapping the three-dimensional hyperspectral 
datacube to a two-dimensional image. The inversion algorithm 
learns a dictionary in situ from the measurements via global- 
local shrinkage priors. By using RGB images as side information 
of the compressive sensing system, the proposed approach is 
extended to learn a coupled dictionary from the joint dataset 
of the compressed measurements and the corresponding RGB 
images, to improve reconstruction quality. A prototype camera 
is built using a liquid-crystal-on-silicon modulator. Experimental 
reconstructions of hyperspectral datacubes from both simulated 
and real compressed measurements demonstrate the efficacy of 
the proposed inversion algorithm, the feasibility of the camera 
and the benefit of side information. 

Index Terms —Compressive sensing, hyperspectral image, side 
information, Bayesian shrinkage, dictionary learning, blind com¬ 
pressive sensing, computational photography, coded aperture 
snapshot spectral imaging (CASSI), spatial light modulation. 

I. Introduction 

Hyperspectral imaging techniques have been widely applied 
in various fields such as astronomy [1], remote sensing [2], 
and biomedical imaging [3]. Unlike ordinary imaging sys¬ 
tems, hyperspectral imagers capture a three-dimensional (3D) 
datacube, i.e., a 2D array of vectors that contain the spectral 
information at each spatial location. Inspired by compressive 
sensing (CS) [4], [5], researchers have adopted joint sensing 
and reconstruction paradigms that measure a subset of the 3D 
spectral datacube [6]—[9] and utilize CS inversion algorithms 
to retrieve a 3D estimate of the underlying hyperspectral im¬ 
ages. This sensing paradigm follows the traditional benefits of 
CS - reduced data rates and system complexity at the expense 
of computational algorithmic development and postprocessing. 

The coded aperture snapshot spectral imaging (CASSI) [7], 
[8] systems are examples of CS hyperspectral imaging sys¬ 
tems. The CASSI paradigm encodes each of the datacube’s 
spectral channels with a unique 2D pattern, which is the 
underlying operating principle behind code division multiple 
access (CDMA). CASSI systems form an image onto a coded 
aperture placed at an intermediate image plane to spatially 
modulate the datacube with high-frequency patterns (Fig. 1). 
A disperser placed in a pupil plane behind the coded aperture 
spectrally shifts the coded image, effectively granting each 
spectral channel onto its own unique coding pattern when mul¬ 
tiplexed onto a monochrome sensor. Unlike other hyperspectral 


imaging techniques [10]-[12], CASSI can obtain a discrete 3D 
estimate of the target spectral datacube from as little as a single 
2D measurement. Such operation renders the system’s forward 
model highly underdetermined; inversion requires use of CS 
algorithms [13]—[16]. Compressive hyperspectral imaging re¬ 
quires the signal under estimation to be sparse in a basis that 
is incoherent with the system sensing matrix [7]. The recon¬ 
struction is accomplished using optimization algorithms, such 
as gradient projection for sparse reconstruction (GPSR) [7] or 
two-step iterative shrinkage/thresholding (TwIST) [17]. GPSR 
assumes sparsity of the entire datacube in a fixed (wavelet) 
basis, while TwIST is based on a piecewise-constant spatial 
intensity model (when the total variation norm is used) for 
hyperspectral images. 

Distinct from the above optimization algorithms, blind CS 
algorithms [18] have been applied to CASSI systems by learn¬ 
ing dictionaries from the measurements. Blind CS inversion 
strategies seek to recover 3D patches of the hyperspectral 
datacube jointly with the shared dictionary (inferred from 
the measurements). Each patch is a sparse combination of 
the dictionary atoms. Since the dictionary is unknown a 
priori , this is called blind CS [19]. This blind CS model 
shares statistical strengths among similar image patches at 
different spatial locations. Additionally, this CS approach is 
task-driven since the learned dictionary is appropriate for 
different tasks [20]. In this paper, we propose a new blind CS 
model that imposes compressibility [21], rather than sparsity, 
on the recovered dictionary coefficients. Specifically, we use 
global-local shrinkage priors [22]-[24] on the dictionary co¬ 
efficients for each patch, under a Bayesian dictionary learning 
framework. This setting drives the reconstructed dictionary 
coefficient vector to contain many small (i.e., close to zero) 
components while allowing for a few large components, but 
avoiding explicit sparsity enforcement. This model is feasible 
for high-dimensional hyperspectral signals since it can extract 
more information from the limited (in situ learned) dictionary 
atoms. 

The quality of the reconstructed hyperspectral images relies 
on the conditioning of the sensing matrix. More accurate 
recovery is possible with additional measurements of the 
same scene [17] or by using digital micromirror (DMD) 
arrays [25]; however, these methods increase system cost and 
energy consumption. This paper features a blind CS algorithm 
that employs the RGB image of the target scene (as side 
information) and demonstrates substantial improvements on 
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reconstruction quality with a single CAS SI measurement. 
Particularly, the proposed algorithm learns a joint dictionary 
from a single CAS SI measurement and the corresponding 
RGB image (of the same scene) and then reconstructs the 
hyperspectral datacube. Since the RGB image is strongly 
correlated with the hyperspectral images, this joint model will 
dramatically aid the dictionary learning algorithm, thereby 
improving the reconstruction. 

Furthermore, we propose a new camera using spatial-light 
modulation (SLM) [26] for an active coding compressive 
spectral imager, to jointly modulate the spatial and spectral 
information of the scene on the intermediate image plane. 
This technology differs from CAS SI and DMD-modulated 
CS imagers in that no dispersive element is required for 
multiplexing the spectral channels. 

The reminder of the paper is organized as follows. Section II 
reviews the mathematical model of CAS SI and introduces the 
new camera. Section III develops the new blind CS algorithm 
to reconstruct hyperspectral images. Experimental results are 
presented in Section IV, and Section V summarizes the paper. 


Detector 

array 



Fig. 1: Schematic of CASSI [8]. The imaging optics image 
the scene onto the coded aperture. The relay optics relay the 
image from the plane of the coded aperture to the detector 
through the dispersive element (figure from [8]). 


II. Hardware 

In this section, we first review the CASSI imaging process 
and then introduce the new camera. Finally, a shared mathe¬ 
matical model for both cameras is proposed. 

A. Review of CASSI 

A common CASSI design [7], [8], [17] in Fig. 1 adapts 
the available hardware to encode and multiplex the 3D spa- 
tiospectral information f(x,y; A) onto a 2D detector. This 
modulation process is based on physically shifting the hy¬ 
perspectral datacube multiplied by the binary coding pattern 
T(x,y) via chromatic dispersion. The coding pattern consists 
of an array of square, binary (100% or 0% transmission) 
apertures, which provide full or zero photon transmission. The 
camera’s disperser (in traditional hyperspectral CS [6], [27], 
this is a prism or diffraction grating) laterally displaces the 


image as a function of wavelength d( A —A c ) of the coded aper¬ 
ture pattern, where d( A) is the lateral wavelength-dependent 
shift and A c represents the disperser’s central wavelength. As 
previously mentioned, this coding process can be considered 
a form of CDMA, whereby each channel is modulated by an 
independent coding pattern on the detector plane. The detector 
integrates the spectrally-shifted planes along the spectral axis. 
Information can be recovered by separating channels based on 
their projected coding patterns. This multiplexing process can 
be represented as the following mathematical forward model: 

g{x,y) = J T(x+d(\-\ c ,y))f(x+d(\-\ c ,y; A))dA, (1) 

where g(x,y) is the continuous form of the detector measure¬ 
ment, representing the sum of dispersed coded images. Since 
the detector array has been spatially pixelated by the detector 
pixel pitch A, the discreatized detector measurement becomes: 

9rn, n = JJ g{x,y) rect(^- - TO, - n)dxdy. (2) 

Finally, the discrete measurement for each pixel can be illus¬ 
trated as: 

9m,n ^ ^ — l),n/(m+j> — l),n,j 3“ e m,n-> (3) 

j 

where T m?n is the spatially encoding pattern, fm,n,j represents 
the discretized spectral density of f(x,y, A), and e m?n is the 
noise. 

B. A New Camera: SLM-CASSI 

Here we use an SLM to encode the 3D spatial-spectral infor¬ 
mation on a 2D gray-scale detector - a similar CDMA strategy 
to CASSI for snapshot hyperspectral imaging. An SLM is an 
array of micro cells placed on a reflective layer; each cell 
has a nematic liquid crystal that adds pixel-level phase to the 
incident light as a function of voltage, thereby changing the 
polarization state on a per-pixel basis [26]. Each layer of the 
LC can be considered as a thin optical birefringence material; 
the orientation and the relative refractive index difference be¬ 
tween the fast and slow axes determines the effective birefrin¬ 
gence. Since most birefringent phase modulators are nominally 
sensitive to wavelength, this element can assign wavelength- 
dependent transmission patterns to multiplex every spectral 
channels for the compressive measurement. The hyperspectral 
slices can be separated from the coded data via CS inversion 
algorithms. 

1) Experimental Setup: A schematic of this SLM-based 
CASSI system is shown in Fig. 2. An objective lens (LI) 
images the scene from a remote distance, and then the 
collimation lens (L2) and imaging lens (L3) relay the im¬ 
age through the polarizing beamsplitter and the achromatic 
quarter-wave retarder onto the SLM. The retarder serves 
to increase the contrast of the SLM polarization array by 
compensating the extra phase retardation in the liquid crystal 
device. The spatiospectral coding function is implemented on 
the SLM as an 8-bit, pseudo random, pixel-level grayscale 
phase retardation pattern. The phase retardation manifests 
as wavelength-dependent ampitude modulation upon re-entry 
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Fig. 2: The system schematic of CASSI with SLM. The parts 
in the dashed box represent ongoing work. 


through the polarizing beamsplitter toward the camera. Finally, 
the modulated image is projected by the imaging lens (L4) 
onto the detector plane and then recorded by the detector. The 
following mathematical forward model can be used to describe 
the multiplexing process of the compressive sampling: 

9{x,y) = J T(x,y,\)f(x,y; A)dA, (4) 

where T(x,y, A) are the wavelength dependent transmission 
patterns provided by the SLM, which can be calibrated by 
analyzing its electrically-controlled birefringence. Since the 
detector array is spatially pixellated, the ( m,n) th detector 
measurement is given by: 

9m,n = ^ ^ Tm,n,j fm,n,j T" ^ m,n-> (5) 

j 

where T m?n j represents the discretized transmission patterns. 
Fig. 3 is a photograph of the experimental setup. The setup in¬ 
cludes a 60 mm objective lens (Jenoptik), a 75 mm achromatic 
relay lens (Edmound Optics), a polarization beam splitter 
(Newport), an achromatic quarter wave plate (Newport), a liq¬ 
uid crystal based SLM (Pluto, Holoeye), two 75 mm imaging 
lenses (Pentax), and a monochrome CCD camera (Pike F-421, 
AVT) with 2048x2048 pixels that are 7.4 fim square. A 450- 
680nm band pass filter (Badder) is mounted on the objective 
lens to block unwanted ultra-violet and infrared light (the SLM 
and camera are optimized for visible-range operation). The 
SLM provides strong modulation and effective multiplexing to 
fulfill the requirement of compressive sensing. This modulator 
is based on a reflective Liquid Crystal on Silicon (LCoS) 
microdisplay technique[26], which has a 1920 x 1080-pixel 
active area with an 8 fi m pixel pitch. 

2) System Calibration: The effect of the 8-bit, pseudo¬ 
random voltage pattern on the spectral channels can be es¬ 
timated by recording the system response generated by the 
SLM. Fig. 4 records the transmission under the monochro¬ 
matic illumination and homogeneously-applied voltage on the 
modulator. We average the transmission across the center area 



Fig. 3: The experimental prototype of the system. 



Fig. 4: Spectral dependence of SLM amplitude modulation. 
Given different voltages and incident wavelengths (in nm), 
the SLM combined with the optics generates corresponding 
transmission code. Within 450 nm and 680 nm, an 8-bit volt¬ 
age variation on the SLM generates a gray scale transmission 
ratio between 0.08 and 0.96. 


of the SLM and calculate its relative transmisson. A potential 
advantage of this SLM is that we can change the code easily 
to get different modulations. 1 

We assume that the system operator provides one-to-one 
mapping (1:1 magnification of the SLM pixels onto the 
detector) between the micro-display and the detector array. 
Theoretically, the system operator T(x,y, A) can be estimated 
by using SLM’s calibration data and the applied voltage on 
the SLM. However, one might account the error in the real 
system projection. For example, optical aberrations and sub¬ 
pixel alignment discrepancies between the detector and SLM 
might break the ideal mapping and image some of the SLM 
pixels onto several detector pixels. Therefore, part of the 
transmission code might deviate from the ideal value, which 

! This also opens the door of adaptive compressive sensing [28]—[30]. 
However, after plenty experiments, we found the contribution of adaptive 
sensing is very limited in the CS inversion, partially due to the mismatch 
between the designed code and the calibration. We further found random 
coding performing well in our camera. 
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results in an inaccurate system operator. Since the system 
operator dominates the quality of the object estimation in 
this inverse problem, a careful calibration of the T(x,y, A) 
is required. 

A better representation of the system response can be 
acquired by recording the transmission pattern illuminated by 
each spectral band. The revised system operator T(x,y,\) 
includes all the possible coding patterns generated by the 
SLM, which can contribute to the data reconstruction. Here we 
combine the tungsten light source (LSH-T250, Horiba) with 
a monochromator (iHR320, Horiba) to quantize the spectral 
dimension into bands of finite width. Each band has a 7.5- 
8 nm full width at half maximum (FWHM). The scene’s 
spectral irradiance is recorded by a fiber optics spectrometer 
(USB2000, Ocean Optics); these values are taken as ground 
truth in the experiments presented later in the paper. The 
number of spectral channels and their central wavelengths are 
determined by the grating period (1800 periods/mm) and the 
optical path length of the monochromator. To better represent 
the continuous light source, two adjacent spectral bands are 
separated by the monochromator’s FWHM. At this calibration 
resolution, the system’s spectrally-sensitive bandwidth (450- 
680nm) has been discretized into 30 spectral channels. Im¬ 
portantly, the spectral resolution of the reconstructed data is 
determined by the monochromator’s resolving power during 
calibration; smaller FWHM values result in larger numbers of 
calibrated and reconstructed spectral channels. Compared to 
the spectral imagers that are reliant upon dispersive elements to 
spatially shear the physical code (in CAS SI), this SFM based 
spectral imager can easily improve the spectral resolution 
without revising the main camera’s optical design. 


C. Shared Mathematical Model of CAS SI and SLM-CASSI 

Assuming we are measuring a hyper spectral datacube X £ 
RN x x N y x N\ ^ w fi ere n x signifies the number of spectral 
channels of spatial extent N x x N y . We can interpret both 
CAS SI and SFM-CASSI measurements as a summation of 
the 3D Hadamard product of the datacube with the different 
modulation codes, M = O Cj- Here, the C j is a 

shifted version of the same code as in CASSI [18]; however, 
each spectral modulation pattern of C j depends on the SFM 
response in SFM-CASSI. For each pixel, both (3) and (5) can 
be represented as: 


N x 

M(m, n) = ^ Xj(m, n)Cj(m, n). (6) 

3 = 1 

The pixel-wise modulation of CASSI results in feasible patch- 
based reconstruction algorithms. 

Since the SFM-CASSI sensing paradigm reconstructs N\ 
spectral images from a single measurement M, the compres¬ 
sion ratio of the CASSI and SFM-CASSI systems discussed 
above is N\. The compressive sampling process in both 
spectral imaging systems can be represented by the same 
matrix described in the following section. 


III. Reconstruct Hyperspectral Images with 
Blind Compressive Sensing 

In this section, we develop a new blind Bayesian CS model 
to reconstruct hyperspectral images from CASSI or SFM- 
CASSI measurements. The proposed model is a generalized 
framework which can also be used in denoising and inpainting 
problems, among others [31]. 


A. Dictionary Learning Model 

We decompose the 3D hyperspectral datacube into N small 
patches with size n x x n y x N\. In vector form, these patches 
are denoted x n £ M p ,Vn = 1, ... ,7V, with P = n x n y N\. 
We seek a dictionary D £ M PxP with K atoms such that 
x n = Ds n , based on the captured image M. For each 
patch, we have the corresponding 2D measurement patch 
{M n }^ =1 £ M n * xn ^ and the 3D mask (coding) patch 

{C n }^ =1 £ {0, i} n xxn y xN X ' Considering the vectorized 
format of the measurement for n th patch y n £ W lxU y, and the 
corresponding mask patches {W^}^ =1 £ {0, l} n * n ^ Vj = 
1,..., N\ (each is the vector form of j th 2D slice in 

C n ), we have the measurement model 


Vn 

^ n 

Y 


T~ (7) 

diag{^W},..., diag{ W^>}] , (8) 

yi,...,y N ], X* [xi,...,x N , (9) 


where e n represents the additive Gaussian noise, and we 
model it as e n ~ A/"(0, <r^" 1 Ip), with cto denoting the noise 
precision. 2 We place a diffuse gamma prior on a 0 


(T 0 ~ Ga(c 0 , d 0 ), (10) 

where (co,do) are hyperparameters. Taking account of the 
dictionary learning model, (7) can be written as: 

Vn = T~ (H) 

where s n is a vector of coefficients describing the decompo¬ 
sition of the signal x n in terms of dictionary atoms. Given 
the measurement set Y and the forward matrices 
we aim to jointly learn D, {s n }^ =1 , and the noise precision 
parameter <to to recover X. One key difference of our model 
compared to other blind CS work [19] is that each patch has 
a unique ^ n , which is inspired from our cameras since the 
mask is generated randomly and \F n also takes account of the 
system calibration. 

We model each dictionary atom as a draw from a Gaussian 
distribution, 

D = [di,..., cLk\ , 

d k - V(0,2lp), Vk=l,...,K. (12) 

2 The extension of our model to non-uniform denoising problem [32] is 
straightforward, i.e., by imposing spatially-varying noise models for different 
patches. 
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The coefficients Sk, n are assumed drawn from the marginalized 
distribution 


P{Sk,n\^~n, *&k,n) 



i n,0,T n 1 a 1 )InvGa(a; 1, (2$ fei „) 1 )da 


1 

2 




(13) 


where InvGa(-) denotes the inverse-gamma distribution. The 
parameter r n > 0 is a “global” scaling for all coefficients 
of the n th patch, and $k,n is a “local” weight for the kth 
coefficient of that patch. We place a gamma prior Ga(go, ho) 
on where one may set the hyperparameters (go, ho) to 

ensure that most $>k,n are small. This encourages most Sk jn 
to be small. By maximizing the log posterior to obtain a point 
estimate for the model parameters, one observes that the log of 
the prior in (13) corresponds to adaptive Lasso regularization 
[33]. 

Equivalently to (13), the model for Sk, n ma y be represented 
in the hierarchical form 


Sk,n l^n 5 ^k,n 

~ V(0 ,T n 1 a k ^ n ), 

(14) 

&k,n | *&k,n 

~ InvGa(l, (2$ fci „) _1 ), 

(15) 

r n 

~ Ga(a 0 , bo), 

(16) 

$k,n 

~ Ga(g 0 ,h 0 ), 

(17) 


where latent variables {ak, n } are included in the generative 
model, instead of marginalizing them out as in (13). A 
vague/diffuse gamma prior is placed on the scaling parameters 
r n . Despite introducing the latent variables {a^ ?n }, the form 
in (15) is convenient for computation, and with an appropriate 
choice of (go, ho), most {a^n} are encouraged to be large. 
The large ak, n corresponds to small s/c ?n ; this model imposes 
that most Sk, n are small, i.e. it imposes compressibility. Note 
that (14) is different from the model used in [34], where a 
single r is used for all patches; here, we impose different 
compressibility for each patch by inferring a unique r n , thus 
providing flexibility. 

In order to automatically infer the number of necessary 
dictionary atoms, we can replace (11) with 


Vn 


(18) 

A 

= diag[^i ,...,v K ], 

Vk ~ A/”(0, gp), (19) 

Vk 

K 

= m vfc=i >-- 

.,K; fjj ~ Ga(e 0 ,/o).(20) 


3 = 1 



The multiplicative gamma prior (MGP) [35] used above is 
developed to stochastically increase the precision pk as k 
increases. During inference, we observe that as k increases, 
Vk tends to zero. This results in an approximate ordering of Vk 
by weight. Based on this weight, we can infer the importance 
of the columns of D. The other use of the zVs is to avoid 
over-fitting during the learning procedure. We can update a 
fraction of D by the weight of Vk and discard the atoms with 
small weights to reduce the computational cost (refer to the 
Appendix for the inferred Vk and learned dictionary D). 


B. The Statistical Model 

The full statistical model of the proposed blind CS approach 
is: 


Vn 

~ A/^^SgDA. 5 n , cxq Ip), 

(21) 

D 

= [di,...,d.K\ , dk ~ A/"(0, — Ip), 

(22) 

Sk,n 


(23) 

&k,n 

~ InvGa(l, (2$ft >n ) _1 ), 

(24) 

A 

= diag^t,. ..,v K \, 

L 

(25) 

Vk 

K 

~ T] k ~ ~[[ //,. 

3 = 1 

~ G&(ao,bo), ao ~ Ga(co, do)i 

(26) 

T n 

(27) 

Vj 

~ Ga(e 0 ,/ 0 ), $k,n ~ Ga(g 0 ,h 0 ), 

(28) 

where broad priors are placed on the hyperparameters 
(a 0 ,bo,c 0 ,do,eo, fo,go,ho), i.e., a 0 = • • • = ho = 10 -6 . 
Note in (23), the coefficients are also scaled to the noise 


precision a 0 . 

Let 0 = {D, S, A, e} denote the parameters to be inferred. 
The log of the joint posterior may be expressed as: 

- logp(©|Y) 

= + PXM\l + log Ga(ao) 

(29) 

a 0 T n a k,n s k,n , n f \ 

+ --- + 2^ lo s Ga ( r «) 

n 

+53 io s Ga ($fc,n)+53 log invGa ( a fc,«; ) ( 30 ) 

k,n n,k ^ ,n 

+ + 53 log Ga(?7fc). (31) 

k 

The terms in (29) are widely employed in optimized-based 
dictionary learning [32], [36], [37]. The first term imposes an 
£2 fit between the model and the observed data y n and the 
second term imposes an i 2 regularization on the dictionary 
atoms. The third term regularizes the relative importance of 
the aforementioned two terms in (29) via the weighting a 0 
(updated during the inference). The terms in (30) impose both 
compressibility (via the global-shrinkage parameter r n and 
local-shrinkage parameter ak, n ) and an £2 regularization on 
Sk,n- Finally, the terms in (31) impose both £2 regularization 
and shrinkage on 1 /*.. 


C. Related Models 

The shrinkage manifested by (30) is the most distinct 
aspect of the proposed model relative to previous optimization- 
based (and Bayesian-based [18], [31]) approaches. These 
other approaches effectively impose sparsity through £\ reg¬ 
ularization on s n . This is done by replacing terms in (30) 
with 7 S ||||i* To solve an optimization-based analog to 
the proposed approach, one seeks to minimize the objective 
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function 

£ ( @ ) = «0 EnllPn-*n DA *«ll2 

+ — Nfclll +7s ^|| Sw || i; (32) 

n 

the parameters ao and 7 S are typically set by hand (e.g., via 
cross-validation). One advantage of the Bayesian framework 
is that we infer posterior distributions for a 0 and (in our 
model this is (r n , a^ ?n ), along with similar posterior estimates 
for all model parameters without cross-validation. We also 
note that if we replace (30) by the spike-slab prior as used in 
[31], the model will reduce to BPFA. As opposed to this spike- 
slab prior, which imposes sparsity directly, our shrinkage prior 
imposes compressibility, and it is more appropriate to the high 
dimensional hyperspectral image patches. The global-local 
shrinkage prior used here can extract more useful information 
from the limited dictionary atoms 3 . 

Other forms of shrinkage priors like Gaussian scale model 
and Laplace scale mode can be found in [38]—[41]. Aiming 
to better mimic the marginal behavior of discrete mixture 
priors, the global-local shrinkage priors [22], [23] have been 
developed to offer sufficient flexibility in high-dimensional 
settings, which inspires our model. 


D. Inference 

Due to local conjugacy, we can write the conditional poste¬ 
rior distribution for all parameters of our model in closed form, 
making the following Markov Chain Monte Carlo (MCMC) 
inference based on Gibbs sampling a straightforward proce¬ 
dure. 


1) Sampling dk'. 

p(dk \-) oc 


(33) 


= 


N 


-I -1 


PIp+aolstJ2 S ln*n*r, 


n= 1 


N 


n=l 

Vn-k = Vn *nDAs„ + '& n d k V k S k ,r 

2) Sampling s k>n : 

p(s k ,n\~) OC N{ps k , n ,0 2 s hn )-, 


(34) 




Ps k , n = a' 0 a 2 Skn is k d k T 'f’Ty n _ k . 


3) Sampling r n : 


/ 1 i K 

P( T n\—) oc Ga a 0 + ^K,b 0 + - ^4, 


n^k,n^ 0 


k=l 


(35) 


3 We did experiments of denoising and inpainting with benchmark color 
images and compared with K-SVD [32] and BPFA [31]. The results are shown 
in the appendix. The proposed algorithm constantly performs better than the 
above two methods. 


4) Sampling a k/ „ : 


p(a k , n \-) OC IG 




k,n 


o *&k,n 


(36) 


where IG(-) denotes the inverse-Gaussian distribution. 

5) Sampling 

p($fc,n|-) oc GlG(2h 0 ,a^ n ,g 0 - 1), (37) 

where GIG(x;a,6,p) is the generalized inverse Gaus¬ 
sian distribution 

GIG (a;; a, b,p) = - x p ~ l exp (~^(ax+ -)) , 

2 K p (Vab) V 2 x J 

and K p (6) is the modified Bessel function of the second 
kind 

/•oc -i / i q 2 \ 

K P (0) = jf -d-H?- 1 exp (--(t + —)J dt. 

6) Sampling ao: 

p(a 0 1-) oc Ga(ci,di); (38) 

Cl = C 0 + -J2\\*n\\o+ 2 KN , 

n ~ 

di = d 0 +^'^2\\y n -^ n DKs n \\ 2 2 

n 

+ E s k,n T n a k,ru 

n k 

where ||\P n ||o denotes the number of nonzero entries in 
4 F n , and refers to the conditioning parameters of the 
distributions. 


The update equations of MGP related parameters {okiVkiVk} 
can be found in [35] (also listed in the Appendix). In ap¬ 
plications where speed is important, we can use all condi¬ 
tional posteriors including those above to derive a variational 
Bayes (VB) inference algorithm for our model, which loosely 
amounts to replace conditioning variables with their corre¬ 
sponding moments. Details of the VB procedure can be found 
in the Appendix. 


E. RGB Images as Side Information 


While reconstructing the hyperspectral images is challeng¬ 
ing (the compression ratio is N\ : 1 when a single measure¬ 
ment is available), an RGB image of the same scene can be 
obtained easily by an off-the-shelf color camera in the unused 
path of the system (i.e. directly reflecting off the polarizing 
beamsplitter) as shown in Fig. 2. We here consider using the 
RGB image as side information to aid the reconstruction. 

In the case of an additional side RGB camera, the measure¬ 
ment is a joint dataset composed of the RGB image and the 
CASSI measurement; the compression ratio can be considered 
as (N\ + 3) : 4. The new measurement model is: 


Y 


^ 0 

X 

Y(rgb) 


0 I 

X(rgb) 


def-^v def^. 


(39) 
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TABLE I: Reconstruction PSNR (dB) of various scenes with different algorithms. Each feature shows the mean value and 
standard derivation of the PSNRs of reconstructed hyperspectral images across wavelength channels. 



where (Y^), X( r s b )} are the patch format RGB image. 
Considering each patch, 


’ Vn 


\Lr 
^ n 

0 ' 


x n 

. y£ eh) _ 


0 

I 


^.( r s b ) 


where is the vectorized form of the RGB image 

patch corresponding to {y n }n=i and similar for {a;i rsb ^ =1 . 
The dictionary learning method proposed in Section III-A is 
now performed on the joint dataset X = DS to learn the 
super-dictionary D G R PxK , where P = P + 3 n x n y . 

Given Y, the proposed model learns D and S simultane¬ 
ously. For the recovery of hyperspectral images, we select 
the top P rows from the product of D and S. Averaging the 
overlapping patches and reformatting the data to 3D give us the 
desired hyperspectral images. We aim to get both clear images 
at each wavelength and correct spectra for each pixel, which 
are both embedded in the dictionary atoms. Since the RGB 
images are fully observed and embed the same scene as the 
hyperspectral images, this side information will facilitate the 
algorithm to learn a better dictionary and therefore improving 
the reconstruction quality. 

Another way to use the RGB image is treating each R, G and 
B channel as a superposition of the hyperspectral images with 
different weights corresponding to the quantum efficiency of 
the R, G, and B sensors. However, these quantum efficiencies 
may be different for each camera. Here, we aim to propose 
a general and robust framework to collaborate RGB images 
with CASSI measurements. Therefore, the formulation in (39) 
is adopted. 

In our experimental setup, a grayscale image may also be 
used as side information. As the RGB camera is inexpensive 
and carrying richer spectral information, we only consider 
the RGB case in the following experiments. In the results 
presented below, we consider that RGB image is measured 
separately by an additional camera. We are now modifying 
our SLM-CASSI system to capture the RGB image and the 
compressed hyperspectral image simultaneously as illustrated 
in Fig. 2. 

IV. Experimental Results 

In this section, we verify the proposed blind CS inversion 
algorithm on diverse datasets. For now, we solely consider the 
case where a single measurement (i.e., the compressed hyper¬ 


spectral image) is available 4 . The proposed blind CS algorithm 
is compared with: i) two-step iterative shrinkage/thresholding 
(TwIST) [13] (using a 2D total variation norm on images 
at each wavelength 5 ), and ii) the linearized Bregman algo¬ 
rithm [15]. The linearized Bregman approach respectively 
regularizes the spectral dimension and spatial dimensions with 
t\ -norms of the discrete cosine transformation and wavelet 
transformation coefficients. Note that TwIST and linearized 
Bregman are global algorithms; it is necessary to load the 
entire data into RAM when performing inversion. Conversely, 
our proposed method is locally- (patch) based; the online 
learning is feasible and the inversion can be performed in 
batch-mode [42]. 

We employ a spatial patch size n x = n y = 8. The 
compression ratio, N\, depends on the dataset. The proposed 
model can learn the dictionary atom number K from the data. 
During experimentation, we have found that setting K = 64 
usually provides good results. We have verified that further 
increasing K does not significantly change the outcome of 
our methods. All code used in the experiments was written in 
Matlab and executed on a 3.3GHz desktop with 16GB RAM. 

We evaluate the proposed algorithm on synthetic data pre¬ 
sented in Section IV-A and the real data captured both by 
the CASSI [8], [17] and the proposed SLM-CASSI camera 
Section IV-B. We denote our method as “shrinkage” in the 
experiments (in figures and ta bles) as the shrinkage prior is 
used in our model. To evaluate the algorithm’s performance, 
we use the PSNR of the reconstructed images at different 
wavelengths and the correlation between the reconstructed 
spectrum and the true (reference) spectrum. 

The computational time of our model is similar to the BPFA 
model used in [18] and linearized Bregman. TwIST provides 
faster, but generally worse results than our algorithm in terms 
of the metrics mentioned above. Quantitatively, linearized 
Bregman and TwIST require about 20 minutes and 14 minutes, 
respectively, to reconstruct the bird data (size 768 x 1024 x 24). 


4 Please note that [18] did not show the algorithm proposed therein working 
with a single measurement. In this paper, we focus on the single measurement 
case and our model can readily be used in multiple measurements scenario. 
We also tried the method proposed in [18] and the results are worse than ours. 

5 We also tried the 3D total variation (TV) on the entire datacube, the results 
are a little bit worse than the 2D TV. Therefore, we only show the 2D TV 
results of TwIST. 
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Fig. 5: Reconstructed spectra with different algorithms com¬ 
pared to ground truth, “corr” denotes the correlation between 
the reconstructed spectra and the truth. 


A. Simulation Data 

We use hyperspectral images encoded with a random binary 
(Bernoulli(0.5)) mask to simulate the CASSI measurements. 
The RGB images are available and aligned well with the 
hyperspectral images. 

1) Nature scenes: We first consider the hyperspectral im¬ 
ages of natural scenes used in [43] 6 . There are N\ = 33 
channels (400-720nm with a lOnm interval) and we resize each 
image to N x = N y = 512. The PSNR of each reconstructed 
spectral channel with mean values and standard deviations 
(across wavelength channels) are presented for each algorithm 
in Table I. It can be seen that the proposed shrinkage method 
outperforms the others, especially once the RGB images 
are used as side information. Fig. 5 plots the reconstructed 
spectrum of selected blocks (the average spectrum of pixels 
inside the block is used) in the scene compared to the ground 
truth. It can be seen that: i) though TwIST provides the 
lowest PSNR, the reconstructed spectrum is usually correct; 
ii ) the spectrum reconstructed by the proposed algorithm is 
improved significantly when side information is provided; in) 
the linearized Bregman presents the worst spectrum, mainly 
due to the DCT used in the spectral domain for inversion. 



Fig. 6: Left: the hyperspectral images (ground truth), right: the 
RGB image. 


6 http://personalpages.manchester.ac.uk/staff/d.h.foster/Hyperspectral_ 

images_of_natural_scenes_04.html 


2) Bird data: Next we consider the 24-channel bird data 
(Fig. 6) measured by a hyperspectral camera in [18]. The 
RGB image is also available. Fig. 7 shows the reconstructed 
images using the proposed algorithm with and without side 
information. The left part of Fig. 8 plots the reconstruction 
PSNR of each channel using different algorithms. Again, our 
shrinkage blind CS method coupled with the RGB image 
provides the best result. The right part of Fig. 8 shows 
the reconstructed images of five selected channels using the 
different algorithms. It can be seen that the TwIST results are 
characterized by lost details (over smoothed) of the birds; our 
model’s results without using the RGB images appear noisy. 
Fig. 9 depicts the reconstructed spectra of different birds with 
different algorithms. It can be seen that both TwIST and our 
algorithm with the RGB image provide very good matches 
to the ground truth, while the proposed model without RGB 
images and linearized Bregman do not represent the spectrum 
well. 










Fig. 7: Reconstructed images without RGB image (top) and 
with RGB image (bottom). 


B. Real Data 

In this section, we apply the proposed algorithm to real data 
captured by our cameras (both the original CASSI camera and 
the new SLM-CASSI camera). 

1) Object data: We first demonstrate our algorithm on data 
taken by the original CASSI system [8] 7 . In these experiments, 

7 http ://www. disp. duke. edu/proj ects/ CAS SI/experimentaldata/index .ptml 
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454 nm ^■458nrn ^■462nrn ^■465nrn ^■468nrn ^■472nrn 


475 nm ^■479nrn ^■483nm^■487nm ^■491rirn ^■496nrn 


500 nm ^■505nm ^■509nrn ^■514nrn ^■520nrn ^■525nrn 


531 nm ^■537nrri ^■543nm ^■549nrn ^■bbbnrri^■564nrn 


571 nm ^■579nm ^■587nrn ^■596nm ^■605nrn ^■615nrn 


626 nm ^■637nrn ^■650nrn 


Fig. 10: Real data: measurement and reconstruction without side information (the RGB image). 


the reconstructions have 33 spectral channels (marked on the 
reconstruction in Fig. 10) and N x = N y = 256. There are 
4 objects in the scene, a red apple, a yellow banana, a blue 
stapler and a green pineapple. Fig. 10 shows the reconstruction 
of the proposed algorithm without the RGB image. Since the 
RGB image is not well-aligned with the CAS SI measurement, 
we don’t show the result with side information and we found 
the reconstruction with the RGB image (not aligned) looks 
similar to this one due to the simple scene used in this 
experiment. Fig. 11 compares selected reconstructed images 
inverted by different algorithms. It can be seen that the 
proposed algorithm provides more detail of the scene (notice 
the clear apple stem). Fig. 12 plots the reconstructed spectra of 
the four objects. TwIST [8] has yielded accurate spectra; our 


algorithm (without side information) provides similar results. 
The linearized Bregman algorithm does not reconstruct this 
real data well; we don’t not show its results in the following 
experiments. 

2) Bird data: We again consider the bird data, now cap¬ 
tured by the multiframe-CASSI camera [17]. The RGB image 
is aligned manually with the CAS SI measurement (N x = 
703, N y = 1021, N\ = 24). We plot the spectra in Fig. 13; 
the reconstructed images are shown in Fig. 14. It can be 
seen clearly that our proposed method with side information 
provides the best results with respect to both image clarity 
and spectral accuracy. Without the use of side information, 
the proposed model yields clear images but reconstructs the 
spectra poorly. This verifies the benefit of the side information 
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Fig. 8: Left: PSNRs of the reconstructed images at each 
wavelength using different methods. Right: selected channels 
of reconstructed images compared to truth. 



Fig. 9: Reconstructed spectra of different birds with various 
algorithms. 



in real data. 



Fig. 13: Real data: Top: measurement; bottom: reconstructed 
spectra of different birds with various algorithms compared to 
references. 

3) Data with CASSI-SLM : Now we consider the dataset 
captured by the proposed camera. Since no RGB image is 
available, we only show the results of our algorithm without 
side information. Fig. 15 shows the reconstructed spectrum 
and selected frames for the M&M dataset (N x = 512, N y = 
784, N\ = 30) 8 . As with the original CASSI experiments, we 
use a fiber optic spectrometer (USB2000, Ocean Optics) to 
provide the reference spectra for the targets. It can be seen 
again our algorithm provides better results than TwIST, both 
for images and spectrum. 

As a last example, we show the reconstructed images with 
our method for the berry data (N x = N y = 2048, N\ = 40) 
in Figure 16. Notice that the leaf reconstructs are prevalent 
in the green channels (520~590nm), while the berries (red) 
appear in the red portion of the spectrum (610~680nm). 


Fig. 11: Real data: selected channels of reconstructed hy- 
perspectral images. Top row: TwIST, middle row: linearized 
Bregman, bottom row: shrinkage without RGB. Notice that 
only the bottom row provides the clear “apple stem”. 


Yellow Banana 



wavelength 
Blue Stapler 



wavelength 

Fig. 12: Real data: spectra of the reconstructed hyperspectral 
images for the object data. 


V. Conclusions 

We have developed and tested a new Bayesian dictionary 
learning model for blind CS. Specifically, we have demon¬ 
strated high-quality inversion of compressed hyperspectral im¬ 
ages captured by real cameras. Via the global-local shrinkage 
priors, our algorithm imposes compressibility on the dictionary 
coefficients to extract more information from the dictionary 
atoms. The reconstruction quality is improved significantly by 
integrating the compressed measurements with RGB images 
as side information under the blind CS framework. We also 
developed a new compressive hyperspectral imaging camera 
that uses an SLM to perform the spectral coding. Experimental 
results demonstrate the feasibility of the new camera, the 
superior performance of the algorithm, and the benefit of side 
information. 

8 The 30 wavelengths are 450nm, 458nm, 465nm, 473nm, 481.5nm, 
489.5nm, 498nm, 507nm, 516nm, 524.5nm, 532.5nm, 540.5nm, 548.5nm, 
556,5nm, 564.5nm, 572.5nm, 580.5nm, 588.5nm, 596nm, 603.5nm, 611nm, 
618.5nm, 625.8nm, 633.5nm, 641nm, 648.5nm, 656nm, 663.5nm, 671nm, 
678.5nm. 
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Fig. 14: Real data, row 1: reference; and reconstruction with, row 2: shrinkage without the RGB image; row 3: TwIST; row 
4: shrinkage with the RGB image. Notice the blue bird in the first two images for a good spectrum recovery. 



Orange MM 



500 550 600 650 

wavelength 


Yellow MM 



500 550 600 650 

wavelength 



Fig. 15: Real data: reconstructed spectra and images of M&M 
with TwIST and shrinkage without RGB image. 


The original CAS SI camera utilizes a separate disperser 
and encoder; the SLM-CASSI is capable of modulating both 
dimensions with one element. The SLM-CASSI also provides 


the potential of video rate compressive sensing and multi¬ 
frames adaptive sensing using the 60 Hz refresh rate of 
the SLM. SLM-CASSI offers greater flexibility of masking 
functions and an additional RGB camera for side information; 
however, it intertwines spectral and spatial multiplexing capa¬ 
bilities. The coded aperture/disperser architecture used in the 
original CAS SI camera is more reliable, has a smaller form 
factor (i.e. fewer optical parts), and is less expensive, but lacks 
the coding flexibility and ability to use a separate camera for 
side information. 

In this paper, we have shown that by using the RGB image 
as side information in our proposed blind CS framework, 
the reconstruction quality of hyperspectral images can be 
improved significantly. This framework can be generalized 
to other applications and algorithms. For instance, Gaussian 
mixture model based dictionary learning approaches [?], [44]- 
[47] that learn a union of subspaces can also benefit from side 
information. 
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VI. Model and Inference 
The full statistical model is: 
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The posterior density function of model parameters may be 
represented as: 
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A. MCMC Inference 
1) Sampling d^\ 


P(dk\~) 


Vn^ — k 


(X 


N(l*>d k j 

AT 

«p + ^EW 

n=l 


(54) 

(55) 


N 


ao ' Sd^k s k , n i & ny n _ k , 
n = 1 

y n - $„DAs n + '& n dkVkSk,n- 


(56) 

(57) 


3) Sampling r ra : 


/ 1 ! * 

p(r ra |-) oc Ga I a 0 + -AT,& 0 + - ^4, 
V Z fc = l 


n^fe,n^0 


(61) 

4) Sampling a 

p(ot k ,n |—) oc IG ( ./——A-—, A— ), (62) 

yy *&k,n J 

where IG denotes inverse-Gaussian distribution. 
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sian distribution: 

GIG (x;a,b,p) = AA^L-^" 1 exp (~\{ax+ -V , 

2 K p (Vab) V 2 x J 


and K p (0) is the modified Bessel function of the second 
kind 
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* P (0) = l 2 e ~ HP ~ 1 6XP ("2' + TV dt 

Sampling Qf,: 

p(a 0 1-) oc Ga(ci,di); (64) 

Cl = Co + ll^nllo $\kN, (65) 

n 

<h = d 0 + ^^2\\y n -'f? n 'DAs n \\l 

n 

aEE s k,n' r n ( %k,m (66) 

n k 


where \^ n o denotes the number of nonzero entries in 

Vn- 

Sampling 

Vk- 


pKH 

(X A 

(67) 


/ N \ 

-l 

_2 

a vk 

= Ufe + a 0 ^ 

(58) 

B’Vk 

V n=l / 

N 

= a Uk aod k n y n _ k Sk : n- 

(69) 

Sampling 

n— 1 

oc Ga(ei,/i); 

(70) 


e i = e o + x (-^ + — .?)> 

(71) 


s 

II 

+ 
to | 

1M* 

c^Ja 3 

(72) 


2) Sampling s k y. 

p(sk,n \-) OC Af(p Sk , n ,cr 2 Skn ); (58) 

= fn&k,n&0 4" Clor'/T/c (59) 

= a 0 a 2 Sk<n u k d k T ^y n ,_ k . (60) 


/?. Variational Bayesian Inference 

We adopt a mean-field variational Bayesian (VB) inference 
in lieu of its improved runtime. A VB approach attempts to 
approximate the posterior distribution by a simpler distribu¬ 
tion, p(&\Y) « g(0), where Y is the observed data matrix 
and 0 denotes the set of independent latent variables in the 
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model [48]. VB assumes a complete factorization across latent 
variables, g(0) = We define 0 = {19, S', A, e} 

for the purpose of this work. 

Solving for the optimal distribution g*(0) that minimizes 
the distance between p and q effectively estimates the condi¬ 
tional posterior distribution p(0|Y). A commonly-used dis¬ 
tance metric between the two distributions functions is the 
Kullback-Leibler (KL) divergence [49]. We write the KL- 
divergence of p from q as follows: 

KL(gllp) = J & <?(©) In (73) 

which can be simplified to 

lnp(Y) = KL(q\\p)+h(q), (74) 

t(,) = (75, 

We here observe that lnp(Y) is fixed with respect to the vari¬ 
ations in q(S). Therefore, maximizing the Evidence Lower 
Bound (ELBO) L(q) is equivalent to minimizing the KL- 
divergence between the two distributions. This minimal dis¬ 
tance occurs when 

lng*(0) = E[lnp(Y, 0)] + const. (76) 

Assuming a complete factorization across the latent variables 
< 7 ( 0 ) = ru (@i), each parameter in a variational Bayes 
model is independently updated according to 

Qj (©j) oc exp{E^ i [lnp(Y, 0)]}. (77) 

The update equations of VB are straightforward from the 
posterior distribution of Gibbs sampling ((•) represents the 
expectation of the random variable inside): 

N 

(dk) = (<*o)(Sd*)H) ^2{sk,n)^{y n -k), 

n= 1 

{Vn-k) =Vn- *n(D)(A)(s n ): + V n (d k ) {v k ) (s k ,n) ■ 

l N 

(S dfc ) = Pip + (a 0 )K 2 ) J2( S ln)*n* 

\ n =1 

(sk,n) = (a 0 ){<j 2 Skn ){v k ){d k ) T <S>l(y n _ k ), 

(4,n) = M 2 + (a 2 Sk J, 

«,„) = ( { T n){ a k,n){ a o) + (a 0 )Wl)J2K,k) W ^P 
\ P= 1 

(4, k ) = (d P , k f + (4) = k> 2 + «>, 

N 

Wk) = (4 k )( a o)(d k ) T ^2^ly n ,- k {s k ,n), 

n= 1 

( N P 

(%> + («o) J2( S k,n) Y.]K,k) W n,P 

n= 1 p= 1 





ep + 0.5 (K + 1 - j) 

fc + 0.5 

clq H - 0.5 K 

b 0 + 0.5 J2k=l( s k ,n)( a k,n}( a o) ‘ 


(t>k,n)(sl n }(T n }(a 0 )' 


/ a7iK n 


y/2hoKg 0 -i( 


\J a k'n K 9o(^ 2h O a k'n) 
co + 0-5EJI*n||o + 0.5mV 
do + 0.5 En Efc (4,n) ( T «) ( a fe,n 
+0.5E„(lly„-^nDAs n |||) 


(||y n -^„DAs n || 2 ) (84) 

= E [(yn - ^ n DAs n ) T (y„ - f„DAs n )] 

= E[y£y„] - 2E[y£* n DAs„] + E[^AD T >t^ n DA Sn ] 

= v£l/„ - 2y^„(D)(A)(« n ) + E[Tr(^ n DAs n ^AD T ^)] 
= y T n y n - 2y^(D)<A)(s„) + TA([< S „4)<AD T <* n DA)]), 

with Tr(-) denoting the trace of the matrix inside (). 

( S n s n ) = ( S n) { s n ) + 5js n j (85) 


(AD T *^ n DA> 


(^ t )©(D t <^„D), 




(*"' T ) = (j/) (v T ) + diag^,..., ol K \§ 9) 

(D T *^ n D) = <* n D) T (* n D) 

+diag ( * n ’P’ ■ ■ ■ ’ E A.k ^ ».p I 90 ) 

\ P P J 


VII. More Results 

A. Benchmark Images for Inpainting and Denoising 

To demonstrate the general applications of our proposed 
dictionary learning model, we show the denoising and in¬ 
painting results from benchmark color images (8-bits) tested 
in [31], [32]. For inpainting, we show PSNRs of the restored 
images at various observed data ratios (20% means 80% pixels 
are missing) in Table II. Noisy images corrupted with zero- 
mean Gaussian noise with different standard deviations a 
are restored with PSNRs shown in Table III. Our proposed 
algorithm consistently provides the highest PSNR. Inpainting 
results for corrupted images (Figure 18) are presented in 
Figure 19 (PSNRs shown in Table II); denoising results 
for noisy images (corrupted with zero-mean Gaussian noise 
of various standard deviations, Figure 20) are presented in 
Figure 21 (PSNRs shown in Table III). Figure 17 shows an 
example of a dictionary and learned from a noisy image 
using the proposed dictionary learning model (Section VI). 
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Importantly, few iterations of our VB inference are required 
to obtain good results; 20 iterations are used for denoising and 
100 iterations are used for inpainting (~1 second per iteration 
with an image size 256 x 256 x 3 on an i5 CPU with non- 
optimized MATLAB code). 

TABLE II: Inpainting results of color images at various observed 
pixel percentages. We compare BPFA [31] (top row in each cell) with 
the proposed algorithm (bottom row in each cell). 8-bit RGB images 
are used. 


| Horses | Kangaroo 


29.12 

30.44 


31.56 

32.41 


24.59 

25.77 


29.99 

31.15 


29.59 

29.82 


32.02 

33.65 


34.63 

35.23 


27.00 

28.51 


32.52 

33.78 


32.21 

32.29 


36.45 

37.97 


38.88 

39.73 


32.00 

33.67 


37.27 

39.24 


37.34 

37.77 


41.51 

44.10 


42.56 

44.79 


40.73 

43.88 


41.97 

45.29 


42.74 

49.39 


TABLE III: Denoising results of color images at various noise lev¬ 
els; a denotes the noise standard deviations. We compare KSVD [32] 
(top row in each cell), with BPFA [31] (middle row in each cell) and 
the proposed algorithm (bottom row in each cell). 


a 

Castle 

Mushroom 

Train 

Horses 

Kangaroo 


40.37 

39.93 

39.76 

40.09 

39.00 

5 

40.34 

39.73 

39.38 

39.96 

39.00 


41.24 

40.60 

40.45 

40.72 

39.25 


36.24 

35.60 

34.72 

35.43 

34.06 

10 

36.28 

35.70 

34.48 

35.48 

34.21 


37.11 

36.42 

35.42 

36.19 

34.26 


33.98 

33.18 

31.70 

32.76 

31.30 

15 

34.04 

33.41 

31.63 

32.98 

31.68 


34.51 

33.75 

32.68 

33.55 

32.13 


31.19 

30.26 

28.16 

29.81 

28.39 

25 

31.24 

30.62 

28.28 

30.11 

28.86 


31.74 

30.74 

29.71 

30.61 

28.93 
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Fig. 17: Left: dictionary learned from the noisy image “kangaroo” with a = 5 and patch size 7x7x3. Though we set K = 256, only 
147 of these atoms are significant. Right: inferred \uk\ (sorted by absolute values from large to small). 


Data: 20%, PSNR:6.6883 



Data: 20%, PSNR:9.6531 



Data: 30%, PSNR:7.2675 


Data: 30%, PSNR:10.2324 


Data: 50%, PSNR:8.7242 Data: 80%, PSNR:12.6915 



Data: 50%, PSNR:11.6868 





Data: 20%, PSNR:5.0463 


Data: 30%, PSNR:5.63 







Data: 50%, PSNR:7.0843 


Data: 80%, PSNR:11.0771 



Data: 20%, PSNR:6.1984 



Data: 30%, PSNR:6.7766 


Data: 50%, PSNR:8.2329 


Data: 80%, PSNR:12.2135 




Data: 20%, PSNR:9.013 



Data: 30%, PSNR:9.5939 


Data: 50%, PSNR:11.0531 


Data: 80%, PSNR:15.0442 






Fig. 18: Corrupted images with missing values for inpainting. Each row shows one image, each column shows one data ratio (observed 
pixel precentage). 
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Data:20%, PSNR:30.4377 



Data:20%, PSNR:32.4061 



Data:20%, PSNR:25.7715 



Data:20%, PSNR:31.1516 



Data:20%, PSNR:29.8156 



Data:30%, PSNR:33.6536 Data:50%, PSNR:37.9716 



Data:80%, PSNR:44.0961 






Data:30%, PSNR:28.5072 


Data:50%, PSNR:33.6678 


Data:80%, PSNR:43.876 


Data:30%, PSNR:33.7863 


Data:50%, PSNR:39.2406 


Data:80%, PSNR:45.2866 





Data:30%, PSNR:32.2944 



Data:50%, PSNR:37.7691 



Data:80%, PSNR:49.3853 



Fig. 19: Restored images of inpainting; each row shows one image; each column shows one data ratio. 
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o: 5, PSNR:35.0239 


a: 10, PSNR:29.5641 


a: 15, PSNR:26.7463 


a: 25, PSNR:23.3896 


o: 5, PSNR:34.546 


a: 10, PSNR:29.2273 


a: 15, PSNR:25.8706 


o: 25, PSNR:22.7952 






a: 5, PSNR:34.5306 


a: 10, PSNR:28.8054 


a: 15, PSNR:26.2508 


o: 25, PSNR:22.2113 






Fig. 20: Noisy images for denoisng. Each row shows one image; each column shows one noise level, defined by standard deviation a of 
the additive zero-mean Gaussian noise. 
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o: 5, PSNR:40.4457 


a: 10, PSNR:35.4154 


a: 15, PSNR:32.6833 


a: 25, PSNR:29.711 


a: 5, PSNR:40.7155 


a: 10, PSNR:36.1878 


a: 15, PSNR:33.5449 


a: 25, PSNR:30.606 






a: 5, PSNR:39.4463 




a: 25, PSNR:28.9314 



Fig. 21: Denoised images. Each row shows one image; each column shows one standard deviation a. 
















